Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks

Authors

Abstract

Meta reinforcement learning (meta-RL) aims to learn a policy solving a set of training tasks simultaneously and quickly adapting to new tasks. It requires massive amounts of data drawn from training tasks to infer the common structure shared among tasks. Without heavy reward engineering, the sparse rewards in long-horizon tasks exacerbate the problem of sample efficiency in meta-RL. Another challenge in meta-RL is the discrepancy of difficulty level among tasks, which might cause one easy task dominating learning of the shared policy and thus preclude policy adaptation to new tasks. This work introduces a novel objective function to learn an action translator among training tasks. We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy and our objective function (approximately) upper bounds the value difference. We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training. Our approach empirically improves the sample efficiency and performance of meta-RL algorithms on sparse-reward tasks.
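To make the idea of an action translator concrete, here is a minimal toy sketch in Python. It is not the paper's objective or algorithm. It assumes two hypothetical one-dimensional tasks (`step_source`, `step_target`) whose dynamics differ only in how strongly an action moves the state, and it assumes that a translator can be fitted by matching next states between the two tasks. All function names and the linear form of the translator are illustrative assumptions.

```python
import numpy as np


def step_source(s, a):
    """Hypothetical source-task dynamics: the action shifts the state by a."""
    return s + a


def step_target(s, a):
    """Hypothetical target-task dynamics: the same action has twice the effect."""
    return s + 2.0 * a


def fit_linear_translator(states, actions):
    """Fit a scalar gain k so that step_target(s, k * a) matches step_source(s, a).

    Assumption: the target dynamics are linear in the action, so the mean
    squared next-state discrepancy is minimized by a one-coefficient
    least-squares fit.
    """
    source_effect = step_source(states, actions) - states
    target_effect = step_target(states, actions) - states
    # Solve min_k || k * target_effect - source_effect ||^2 in closed form.
    return float(np.dot(target_effect, source_effect)
                 / np.dot(target_effect, target_effect))


# Sample state-action pairs, as if collected by a source-task policy.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=100)
actions = rng.uniform(-1.0, 1.0, size=100)

k = fit_linear_translator(states, actions)

# Largest next-state mismatch after translating source actions into the target task.
gap = float(np.max(np.abs(step_target(states, k * actions)
                          - step_source(states, actions))))
print(f"fitted gain k = {k:.3f}, max next-state gap = {gap:.2e}")
```

In this toy setting the fitted gain halves each source action, so a policy that works in the source task reaches the same next states in the target task. The abstract's claim concerns the value of the transferred policy, which this sketch does not compute.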

Similar resources

Inter-Task Action Correlation for Reinforcement Learning Tasks

Reinforcement learning (RL) problems (Sutton & Barto 1998) are characterized by agents making decisions attempting to maximize total reward, which may be time delayed. RL problems contrast with classical planning problems in that agents do not know a priori how their actions will affect the world. RL differs from supervised learning because agents are never given training examples ...

Learning by Playing - Solving Sparse Reward Tasks from Scratch

We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors – from scratch – in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks, that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is...

Reward, Motivation, and Reinforcement Learning

There is substantial evidence that dopamine is involved in reward learning and appetitive conditioning. However, the major reinforcement learning-based theoretical models of classical conditioning (crudely, prediction learning) are actually based on rules designed to explain instrumental conditioning (action learning). Extensive anatomical, pharmacological, and psychological data, particularly ...

Compatible Reward Inverse Reinforcement Learning

PROBLEM
• Inverse Reinforcement Learning (IRL) problem: recover a reward function explaining a set of expert’s demonstrations.
• Advantages of IRL over Behavioral Cloning (BC):
  – Transferability of the reward.
• Issues with some IRL methods:
  – How to build the features for the reward function?
  – How to select a reward function among all the optimal ones?
  – What if no access to the environment? ...

An Average - Reward Reinforcement Learning

Recently, there has been growing interest in average-reward reinforcement learning (ARL), an undiscounted optimality framework that is applicable to many different control tasks. ARL seeks to compute gain-optimal control policies that maximize the expected payoff per step. However, gain-optimality has some intrinsic limitations as an optimality criterion, since for example, it cannot distinguish ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i6.20635